AMENDMENTS TO THE CLAIMS: 

This listing of claims will replace all prior versions, and listings, of claims in the 
application: 

LISTING OF CLAIMS: 

1. (Currently amended) A method for computing a measure of similarity 
between a first (or input) document and one or more disparate (or search results) 
documents, comprising: 

(a) receiving a first document and identifying the best keywords in the text by 
recognizing rare and uncommon keywords, including keywords that belong to one or 
more domain specific or subject matter specific dictionary; 

(b) identifying documents similar to the first document 
using a query by formulating wrappers using the list of the best keywords identified in 
the first document that also appear in a PS dictionary; 

(c) receiving a first list of rated keywords extracted from the first document and a 
list of rated keywords extracted from each of the one or more disparate documents; 

(d) comparing the first list of rated keywords to the list of rated 
keywords from each of the one or more disparate documents to determine whether the 
first document forms part of the one or more disparate documents using a first 
computed percentage indicating what percentage of keyword ratings in the first list also 
exist in the list of at least one of the one or more disparate documents; 

(e) verifying inclusion of the first document in the one or more disparate 
documents by computing a second percentage for each of the one or more disparate 
documents indicating what percentage of keyword ratings along with a set of their 
neighboring keyword ratings in the first list also exist in the list for at least one of the one 
or more disparate documents when the first computed percentage indicates that the first 
document is included in at least one of the one or more disparate documents; 

(f) using the first computed percentage to specify the measure of similarity when 
the computed second percentage for at least one of the one or more disparate 
documents is greater than the first computed percentage; and 

(g) ranking the one or more disparate documents based on the percentage 
computed indicating what percentage of keyword ratings along with a set of their 
neighboring keyword ratings in the first list also exist in the list for at least one of the one 
or more disparate documents. 

2. (Original) The method according to claim 1, wherein the second 
percentage at (c) is computed by giving weight only to those keywords and their set of 
neighboring keywords in the first list that match in the second list and a threshold 
percentage of the keywords in their set of neighboring keywords. 
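
Purely as an illustration, and not as part of the claim language, the neighbor-aware second percentage recited in claim 2 might be sketched as follows. The window size, the 0.5 threshold, the function names, and the representation of each list as an ordered sequence of (keyword, weight) pairs are all assumptions made for this sketch; the claims do not fix any of them.

```python
from typing import List, Tuple

# Assumed representation: (keyword, weight) pairs in document order.
Rated = List[Tuple[str, float]]


def neighbors(words: List[str], i: int, window: int) -> set:
    """Keywords within `window` positions of index i, excluding position i."""
    lo, hi = max(0, i - window), min(len(words), i + window + 1)
    return set(words[lo:i] + words[i + 1:hi])


def second_percentage(first: Rated, second: Rated,
                      window: int = 2, threshold: float = 0.5) -> float:
    """Weighted share (0-100) of first-list keywords that match in the second
    list together with at least `threshold` of their neighboring keywords."""
    w1 = [k for k, _ in first]
    w2 = [k for k, _ in second]

    # Index every position at which each keyword occurs in the second list.
    positions = {}
    for j, k in enumerate(w2):
        positions.setdefault(k, []).append(j)

    total = sum(w for _, w in first)
    if total == 0:
        return 0.0

    matched = 0.0
    for i, (k, w) in enumerate(first):
        n1 = neighbors(w1, i, window)
        for j in positions.get(k, []):
            n2 = neighbors(w2, j, window)
            # A keyword with no neighbors is treated as fully matched (assumption).
            share = len(n1 & n2) / len(n1) if n1 else 1.0
            if share >= threshold:
                matched += w  # weight is given only to keywords passing the test
                break
    return 100.0 * matched / total
```

In this sketch, a keyword that appears in the second list but in an unrelated context contributes nothing, which is what separates the second percentage from a plain keyword-overlap count.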

3. (Original) The method according to claim 2, wherein the second 
percentage at (c) is computed by giving full weight to those keywords in the first list of 
rated keywords that cannot be accurately identified as having a complete set of 
neighboring keywords in the second set of keywords. 

4. (Original) The method according to claim 2, wherein the threshold 
percentage is reduced when the first list of rated keywords is identified using OCR. 

5. (Previously Presented) The method according to claim 1, further 
comprising (f) if the first computed percentage does not indicate that the first document 
is included in the second document, computing a third percentage using the Jaccard 
distance measure. 
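
For illustration only, the third percentage of claim 5 could be computed along the following lines. The claim names the Jaccard distance measure without saying how it is turned into a percentage; this sketch assumes the percentage is the Jaccard similarity (intersection over union of the two keyword sets) scaled to 0-100, with the Jaccard distance being its complement. The function names are hypothetical.

```python
from typing import Iterable


def jaccard_percentage(first_keywords: Iterable[str],
                       second_keywords: Iterable[str]) -> float:
    """Jaccard similarity of two keyword collections, scaled to 0-100."""
    a, b = set(first_keywords), set(second_keywords)
    union = a | b
    if not union:
        return 100.0  # two empty lists are treated as identical (assumption)
    return 100.0 * len(a & b) / len(union)


def jaccard_distance(first_keywords: Iterable[str],
                     second_keywords: Iterable[str]) -> float:
    """Jaccard distance in [0, 1]: one minus the Jaccard similarity."""
    return 1.0 - jaccard_percentage(first_keywords, second_keywords) / 100.0
```

Unlike the first percentage, this measure is symmetric in the two lists, which suits the revision test of claim 6 where neither document is assumed to contain the other.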

6. (Previously Presented) The method according to claim 5, further 
comprising (g) if the third computed percentage indicates that the first document is a 
revision of the second document, computing a fourth percentage indicating what 
percentage of keyword ratings along with a set of their neighboring keyword ratings in 
the second list also exist in the first list. 

7. (Original) The method according to claim 6, further comprising using the 
fourth computed percentage to specify the measure of similarity except when: (i) the 
fourth computed percentage is greater than the second computed percentage; (ii) the 
first list of rated keywords is identified using OCR; (iii) the fourth computed percentage 
is greater than fifty percent; and (iv) less than twenty percent of the keywords in the first 
list of keywords are in the second list of keywords. 

8. (Original) The method according to claim 1, wherein the first computed 
percentage indicates that the first document is included in the second document when 
the percentage defined by ratio of Sum1/Sum2 is greater than approximately ninety 
percent, where: D1 is the number of keywords in first list of keywords; D2 is the number 
of keywords in the second list of keywords; Sum1 is the sum of the weights of keywords 
that appear in D1 that also appear in D2; Sum2 is the sum of the weights of keywords in 
D1. 
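
As a non-limiting illustration of the ratio recited in claim 8, the following sketch computes Sum1/Sum2 as a percentage and applies the ninety percent cutoff. Storing the first list as a keyword-to-weight mapping, the function names, and the exact cutoff value of 90.0 (the claim says "approximately ninety percent") are assumptions of the sketch.

```python
from typing import Dict, Iterable


def containment_percentage(first: Dict[str, float],
                           second_keywords: Iterable[str]) -> float:
    """100 * Sum1 / Sum2, where Sum2 is the total weight of the keywords in D1
    and Sum1 is the weight of those D1 keywords that also appear in D2."""
    sum2 = sum(first.values())
    if sum2 == 0:
        return 0.0
    second_set = set(second_keywords)
    sum1 = sum(w for k, w in first.items() if k in second_set)
    return 100.0 * sum1 / sum2


def is_included(first: Dict[str, float], second_keywords: Iterable[str],
                cutoff: float = 90.0) -> bool:
    """True when the first percentage exceeds the (assumed) cutoff."""
    return containment_percentage(first, second_keywords) > cutoff
```

Because Sum2 depends only on the first list, the ratio is asymmetric: a short first document can score near 100 against a much longer second document that contains it.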

9. (Original) The method according to claim 1, wherein the first list of rated 
keywords includes one or more keywords translated from a second language different 
from a first language that is identified as being a primary language of the first document. 

10. (Original) The method according to claim 1, wherein the first document is 
a portion of the second document. 

11. (Currently amended) A computer-based system for computing a measure 
of similarity between a first (or input) document and one or more (or search results) 
documents, comprising: 

(a) means for receiving a first document and identifying the best keywords in the 
text by recognizing rare and uncommon keywords, including keywords that belong to 
one or more domain specific or subject matter specific dictionary; 

(b) means for identifying documents similar to the first document 
using a query by formulating wrappers using the list of the best keywords identified in 
the first document that also appear in a PS dictionary; 

(c) means for receiving a first list of rated keywords extracted from the first 
document and a list of rated keywords extracted from each of the one or more disparate 
documents, wherein keywords are rated at least in part by a relevant weight from their 
associated document language; 

(d) means for comparing the first list of rated keywords to the list of 
keywords from each of the one or more disparate documents to determine whether the 
first document forms part of the one or more disparate documents using a first 
computed percentage indicating what percentage of keyword ratings in the first list also 
exist in the list of at least one of the one or more disparate documents; 

(e) means for verifying inclusion of the first document in the one or more 
disparate documents by computing a second percentage for each of the one or more 
disparate documents indicating what percentage of keyword ratings along with a set of 
their neighboring keyword ratings in the first list also exist in the list for at least one of 
the one or more disparate documents when the first computed percentage indicates that 
the first document is included in at least one of the one or more disparate documents; 

(f) means for using the first computed percentage to specify the measure of 
similarity when the computed percentage for at least one of the one or more disparate 
documents is greater than the first computed percentage; and 

(g) means for ranking the one or more disparate documents based on the 
percentage computed indicating what percentage of keyword ratings along with a set of 
their neighboring keyword ratings in the first list also exist in the list for at least one of 
the one or more disparate documents. 

12. (Original) The system according to claim 11, wherein the second 
percentage at (c) is computed by said computing means by giving weight only to those 
keywords and their set of neighboring keywords in the first list that match in the second 
list and a threshold percentage of the keywords in their set of neighboring keywords. 

13. (Original) The system according to claim 12, wherein the second 
percentage at (c) is computed by said computing means by giving full weight to those 
keywords in the first list of rated keywords that cannot be accurately identified as having 
a complete set of neighboring keywords in the second set of keywords. 

14. (Original) The system according to claim 12, wherein the threshold 
percentage is reduced when the first list of rated keywords is identified using OCR. 

15. (Previously Presented) The system according to claim 11, further 
comprising 

(f) if the first computed percentage does not indicate that the first document is 
included in the second document, means computes a third percentage using the 
Jaccard distance measure. 

16. (Original) The system according to claim 15, further comprising (f) if the 
third computed percentage indicates that the first document is a revision of the second 
document, means computes a fourth percentage indicating what percentage of keyword 
ratings along with a set of their neighboring keyword ratings in the second list also exist 
in the first list. 

17. (Original) The system according to claim 16, further comprising means for 
using the fourth computed percentage to specify the measure of similarity except when: 
(i) the fourth computed percentage is greater than the second computed percentage; (ii) 
the first list of rated keywords is identified using OCR; (iii) the fourth computed 
percentage is greater than fifty percent; and (iv) less than twenty percent of the 
keywords in the first list of keywords are in the second list of keywords. 

18. (Original) The system according to claim 11, wherein the first computed 
percentage indicates that the first document is included in the second document when 
the percentage defined by ratio of Sum1/Sum2 is greater than approximately ninety 
percent, where: D1 is the number of keywords in first list of keywords; D2 is the number 
of keywords in the second list of keywords; Sum1 is the sum of the weights of keywords 
that appear in D1 that also appear in D2; Sum2 is the sum of the weights of keywords in 
D1. 

19. (Original) The system according to claim 11, wherein the first list of rated 
keywords includes one or more keywords translated from a second language different 
from a first language that is identified as being a primary language of the first document. 

20. (Currently Amended) An article of manufacture for computing a measure 
of similarity between a first (or input) document and one or more disparate (or 
search results) documents, the article of manufacture comprising computer usable 
media including computer readable instructions embedded therein that causes a 
computer to perform a method, wherein the method comprises: 

(a) receiving a first document and identifying the best keywords in the text by 
recognizing rare and uncommon keywords, including keywords that belong to one or 
more domain specific or subject matter specific dictionary; 

(b) identifying documents similar to the first document 
using a query by formulating wrappers using the list of the best keywords identified in 
the first document that also appear in a PS dictionary; 

(c) receiving a first list of rated keywords extracted from the first document and a 
second list of rated keywords extracted from the second document; 

(d) using the first and second lists of rated keywords to determine whether the 
first document forms part of the second document using a first computed percentage 
indicating what percentage of keyword ratings in the first list also exist in the second list; 

(e) verifying inclusion of the first document in the second document by computing a 
second percentage indicating what percentage of keyword ratings along with a set of 
their neighboring keyword ratings in the first list also exist in the second list when the 
first computed percentage indicates that the first document is included in the second 
document; 

(f) using the first computed percentage to specify the measure of similarity when 
the second computed percentage is greater than the first computed percentage; 

(g) if the first computed percentage does not indicate that the first document is 
included in the second document, computing a third percentage using the Jaccard 
distance measure; and 

(h) if the third computed percentage indicates that the first document is a revision 
of the second document, computing a fourth percentage indicating what percentage of 
keyword ratings along with a set of their neighboring keyword ratings in the second list 
also exist in the first list, the fourth computed percentage is used to specify the measure 
of similarity except when: (i) the fourth computed percentage is greater than the second 
computed percentage; (ii) the first list of rated keywords is identified using OCR; (iii) the 
fourth computed percentage is greater than fifty percent; and (iv) less than twenty 
percent of the keywords in the first list of keywords are in the second list of keywords. 